import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import root_mean_squared_error, r2_score, accuracy_score, precision_score, recall_score, f1_score, precision_recall_curve
from xgboost import XGBRegressor, XGBClassifier
import seaborn as sns
import plotly
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
from skopt.plots import plot_objective, plot_histogram, plot_convergence
import warnings
from IPython import display
12 Smarter Hyperparameter Optimization for Tree-Based Models
Tree-based models, like decision trees, random forests, and boosting algorithms (e.g., XGBoost, LightGBM, CatBoost), benefit significantly from hyperparameter optimization. Tools like GridSearchCV, RandomizedSearchCV, and cross_val_score in scikit-learn enable robust tuning via cross-validation, similar to linear or logistic regression models. Below, we explore these tools and smarter alternatives for complex models.
12.1 Cross-Validation Basics
- cross_val_score: Evaluates a model’s performance for a fixed set of hyperparameters using cross-validation. It requires manual loops to search over hyperparameter combinations, making it labor-intensive for extensive tuning.
- GridSearchCV and RandomizedSearchCV: Automate hyperparameter search with built-in cross-validation, internally handling the loops to evaluate combinations efficiently.
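To make the contrast concrete, here is a minimal sketch of manual tuning with cross_val_score. It is illustrative only and assumes a training set X_train/y_train like the one built later in this chapter:
from itertools import product
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# manual tuning with cross_val_score: the search loop is ours to write
param_grid = {'max_depth': [3, 5, 7], 'n_estimators': [100, 200]}
best_score, best_params = -float('inf'), None
for max_depth, n_estimators in product(param_grid['max_depth'], param_grid['n_estimators']):
    model = XGBRegressor(max_depth=max_depth, n_estimators=n_estimators, random_state=42)
    # mean CV score for this fixed hyperparameter combination
    score = cross_val_score(model, X_train, y_train, cv=3,
                            scoring='neg_root_mean_squared_error').mean()
    if score > best_score:
        best_score, best_params = score, {'max_depth': max_depth, 'n_estimators': n_estimators}
print(best_params, best_score)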
12.2 GridSearchCV: Exhaustive but Limited
GridSearchCV performs an exhaustive search over a predefined grid of hyperparameter values.
- Pros:
- Guarantees finding the best combination within the specified grid.
- Ideal for small, well-defined search spaces (e.g., tuning max_depth=[3, 5, 7] and n_estimators=[100, 200]).
- Cons:
- Requires discrete lists for each hyperparameter, unable to sample from continuous distributions.
- Grid size grows exponentially with the number of hyperparameters (e.g., six hyperparameters with five candidate values each already yields 5^6 = 15,625 combinations), making it computationally infeasible for complex models like boosting trees (e.g., XGBoost with 5+ hyperparameters).
- Time-consuming, especially for large datasets or slow models.
Use Case: Best for simple models or when you have a small, targeted set of hyperparameter values.
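As a minimal sketch (again assuming X_train and y_train are defined), the small grid above maps directly onto GridSearchCV:
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# exhaustive search: every grid combination is cross-validated
grid_search = GridSearchCV(
    XGBRegressor(random_state=42),
    param_grid={'max_depth': [3, 5, 7], 'n_estimators': [100, 200]},
    scoring='neg_root_mean_squared_error',
    cv=3,
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)  # 3 x 2 = 6 combinations x 3 folds = 18 fits, plus a final refit
print(grid_search.best_params_, grid_search.best_score_)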
12.3 RandomizedSearchCV: Efficient but Random
RandomizedSearchCV samples a fixed number of hyperparameter combinations (n_iter) randomly from user-defined distributions.
- Pros:
- More efficient than GridSearchCV for large search spaces, as it evaluates fewer combinations.
- Supports continuous distributions (e.g., scipy.stats.loguniform for the learning rate), offering greater flexibility:
from scipy.stats import uniform
'learning_rate': uniform(loc=0.01, scale=0.3)  # Range: [0.01, 0.31)
- Faster, as it avoids exhaustive enumeration.
- Cons:
- Random sampling may miss optimal regions, especially in high-dimensional spaces or with limited iterations.
- Performance depends on the choice of n_iter and the quality of the specified distributions.
Use Case: Preferred for initial exploration or when computational resources are limited but the search space is large.
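A minimal sketch of the same idea with RandomizedSearchCV, sampling from continuous distributions (again assuming X_train and y_train are defined):
from scipy.stats import loguniform, randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

random_search = RandomizedSearchCV(
    XGBRegressor(random_state=42),
    param_distributions={
        'learning_rate': loguniform(0.01, 0.3),    # sampled on a log scale
        'subsample': uniform(loc=0.5, scale=0.5),  # uniform on [0.5, 1.0)
        'max_depth': randint(3, 16),               # integers 3..15
        'n_estimators': randint(100, 501),         # integers 100..500
    },
    n_iter=25,  # number of randomly sampled combinations to evaluate
    scoring='neg_root_mean_squared_error',
    cv=3,
    n_jobs=-1,
    random_state=42,
)
random_search.fit(X_train, y_train)
print(random_search.best_params_)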
12.4 Challenges with Boosting Models
Boosting models (e.g., Gradient Boosting, XGBoost, LightGBM) are powerful but have many hyperparameters (e.g., learning rate, max depth, number of estimators, regularization terms). This complexity makes GridSearchCV impractical and RandomizedSearchCV suboptimal, as random sampling can be inefficient in high-dimensional spaces.
12.5 Smarter Alternatives for Complex Models
For boosting models, neural networks, or other high-dimensional algorithms, intelligent optimization methods outperform traditional approaches by efficiently exploring the hyperparameter space. Here are the top alternatives:
12.5.1 1. BayesSearchCV (Bayesian Optimization)
How it works: Uses a probabilistic surrogate model (e.g., Gaussian Process) to predict promising hyperparameter combinations based on past evaluations.
Advantages:
- Converges faster than random search by focusing on high-performing regions.
- It works natively with scikit-learn pipelines, simplifying tuning for workflows with preprocessing (see the sketch after this list).
- GP (Gaussian process)-based optimization is effective for smooth, continuous hyperparameters (e.g., learning rate, regularization), converging faster in low-dimensional spaces (tuning 2-5 continuous parameters).
- Ideal for production environments restricted to scikit-learn-compatible extensions.
Library: scikit-optimize or bayes_opt.
Use Case: Ideal for small to medium search spaces with continuous parameters on small to medium-sized datasets.
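Since BayesSearchCV plugs into scikit-learn pipelines, hyperparameters of a pipeline step are addressed with the step__parameter naming convention. Here is a hedged sketch with hypothetical preprocessing; it assumes categorical_feature is a list of categorical column names like the one built below:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBRegressor

# hypothetical preprocessing: one-hot encode categoricals, pass numerics through
preprocess = ColumnTransformer(
    [('cat', OneHotEncoder(handle_unknown='ignore'), categorical_feature)],
    remainder='passthrough',
)
pipe = Pipeline([('prep', preprocess), ('model', XGBRegressor(random_state=42))])
# search-space keys use '<step>__<param>' to reach inside the pipeline
opt = BayesSearchCV(
    pipe,
    {'model__max_depth': Integer(3, 10),
     'model__learning_rate': Real(0.01, 0.3, prior='log-uniform')},
    n_iter=25, cv=3, scoring='neg_root_mean_squared_error', random_state=42,
)
# opt.fit(X_train, y_train) tunes the model with preprocessing refit inside each CV fold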
# Load the dataset
car = pd.read_csv('Datasets/car.csv')
car.head()
|   | brand | model | year | transmission | mileage | fuelType | tax | mpg | engineSize | price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | vw | Beetle | 2014 | Manual | 55457 | Diesel | 30 | 65.3266 | 1.6 | 7490 |
| 1 | vauxhall | GTC | 2017 | Manual | 15630 | Petrol | 145 | 47.2049 | 1.4 | 10998 |
| 2 | merc | G Class | 2012 | Automatic | 43000 | Diesel | 570 | 25.1172 | 3.0 | 44990 |
| 3 | audi | RS5 | 2019 | Automatic | 10 | Petrol | 145 | 30.5593 | 2.9 | 51990 |
| 4 | merc | X-CLASS | 2018 | Automatic | 14000 | Diesel | 240 | 35.7168 | 2.3 | 28990 |
X = car.drop(columns=['price'])
y = car['price']
# extract the categorical columns and put them in a list
categorical_feature = X.select_dtypes(include=['object']).columns.tolist()
# extract the numerical columns and put them in a list
numerical_feature = X.select_dtypes(include=['int64', 'float64']).columns.tolist()
# convert the categorical columns to category type
for col in categorical_feature:
X[col] = X[col].astype('category')
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# build a baseline xgboost model
xgb = XGBRegressor(objective='reg:squarederror', enable_categorical=True, random_state=42)
# fit the model
xgb.fit(X_train, y_train)
# make predictions
y_pred = xgb.predict(X_test)
print ('Baseline Model: ')
# calculate the RMSE
rmse = root_mean_squared_error(y_test, y_pred)
print(f'RMSE: {rmse}')
# calculate the R2 score
r2 = r2_score(y_test, y_pred)
print(f'R2: {r2}')
Baseline Model:
RMSE: 3299.648193359375
R2: 0.9628884792327881
Let’s use BayesSearchCV to boost its performance
%%time
# define the search space for Bayesian optimization
search_space = {
'n_estimators': Integer(50, 500),
'max_depth': Integer(5, 30),
'learning_rate': Real(0.01, 0.3, prior='uniform'),
'subsample': Real(0.5, 1.0, prior='uniform'),
'colsample_bytree': Real(0.5, 1.0, prior='uniform'),
'gamma': Real(0, 5, prior='uniform'),
}
# define the model
xgb = XGBRegressor(objective='reg:squarederror', enable_categorical=True, random_state=42)
# define the Bayesian optimization search
bayes_search = BayesSearchCV(
xgb,
search_space,
n_iter=50,
scoring='neg_root_mean_squared_error',
cv=3,
n_jobs=-1,
random_state=42,
verbose=0
)
# fit the model
bayes_search.fit(X_train, y_train)
CPU times: total: 46.6 s
Wall time: 3min 24s
BayesSearchCV(cv=3,
              estimator=XGBRegressor(base_score=None, booster=None,
                                     callbacks=None, colsample_bylevel=None,
                                     colsample_bynode=None,
                                     colsample_bytree=None, device=None,
                                     early_stopping_rounds=None,
                                     enable_categorical=True, eval_metric=None,
                                     feature_types=None, feature_weights=None,
                                     gamma=None, grow_policy=None,
                                     importance_type=None,
                                     interaction_constraints=None...
              'gamma': Real(low=0, high=5, prior='uniform', transform='normalize'),
              'learning_rate': Real(low=0.01, high=0.3, prior='uniform', transform='normalize'),
              'max_depth': Integer(low=5, high=30, prior='uniform', transform='normalize'),
              'n_estimators': Integer(low=50, high=500, prior='uniform', transform='normalize'),
              'subsample': Real(low=0.5, high=1.0, prior='uniform', transform='normalize')})
XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=0.7800962787418171, device=None,
             early_stopping_rounds=None, enable_categorical=True,
             eval_metric=None, feature_types=None, feature_weights=None,
             gamma=5.0, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=0.03253670126492928,
             max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=5, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             multi_strategy=None, n_estimators=342, n_jobs=None,
             num_parallel_tree=None, ...)
print('Bayesian Optimization Results: ')
# get the best parameters
best_params_bayes = bayes_search.best_params_
print('Best Parameters: ')
print(best_params_bayes)
# get the best score
best_score_bayes = bayes_search.best_score_
print('Best CV Score: ', best_score_bayes)
# make predictions
y_pred_bayes = bayes_search.predict(X_test)
# calculate the RMSE
rmse_bayes = root_mean_squared_error(y_test, y_pred_bayes)
print('Test RMSE: ', rmse_bayes)
# calculate the R2 score
r2_bayes = r2_score(y_test, y_pred_bayes)
print('Test R2: ', r2_bayes)
Bayesian Optimization Results:
Best Parameters:
OrderedDict({'colsample_bytree': 0.7800962787418171, 'gamma': 5.0, 'learning_rate': 0.03253670126492928, 'max_depth': 5, 'n_estimators': 342, 'subsample': 0.5})
Best CV Score: -3061.9464518229165
Test RMSE: 3236.476318359375
Test R2: 0.9642958641052246
12.5.2 Visualizing BayesSearchCV Results
Visualizing BayesSearchCV outcomes helps interpret the hyperparameter optimization process for tree-based models like XGBoost, providing critical insights into convergence, parameter impact, and model behavior that you can use to refine the search.
Key Benefits of Visualization:
- Check Convergence: Plot best scores vs. iterations to determine if the search has stabilized or requires more trials.
- Identify Key Parameters: Use scatter plots to reveal which hyperparameters (e.g., learning_rate) significantly influence performance.
- Explore Interactions: Use heatmaps or contour plots to examine relationships between hyperparameters (e.g., learning_rate vs. max_depth).
- Diagnose Issues: Spot poor search regions or insufficient iterations through patterns in the plots.
Practical Tips
- Plots: Use convergence, scatter, heatmap, or parallel coordinate plots with libraries like Matplotlib or Seaborn.
- Accessing Results: Extract the tuning history from the cv_results_ attribute of the BayesSearchCV object for plotting (a small sketch follows).
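As a small sketch of that second tip, the running best score can be computed directly from cv_results_ (the scorer here is negative RMSE, so higher is better):
# mean CV score of each evaluated point, in the order the search tried them
scores = np.array(bayes_search.cv_results_['mean_test_score'])
# running best: has the search stopped improving?
running_best = np.maximum.accumulate(scores)
plt.plot(running_best)
plt.xlabel('Iteration')
plt.ylabel('Best mean CV score so far (negative RMSE)')
plt.show()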
Let’s plot the convergence first
from skopt.plots import plot_convergence, plot_objective
# plot the convergence
bayes_res = bayes_search.optimizer_results_[0]
plot_convergence(bayes_res)
plt.show()
# the raw objective values at each iteration
func_vals = np.array(bayes_res.func_vals)
# the best objective value found so far
best_val = func_vals.min()
# All iterations (1-based) that hit that minimum
best_iters = np.where(func_vals == best_val)[0] + 1
print(f"Best objective = {best_val:.4f}")
print(f"Reached at iteration(s): {best_iters.tolist()}")Best objective = 3061.9465
Reached at iteration(s): [50]
- func_vals is an array of length n_iter, one entry per call.
- We use argmin/min because BayesSearchCV minimizes the underlying objective, which is the negated CV score (here scoring is negative RMSE, so func_vals holds positive RMSE values).
- If you’re maximizing some metric, either negate func_vals or interpret -best_val appropriately.
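Concretely, the best value of the original scorer is the negated minimum:
# func_vals holds the negated CV scores, so negating the minimum recovers
# the scorer's best value (here, the best negative RMSE)
best_scorer_value = -func_vals.min()
print(f"Best CV score: {best_scorer_value:.4f}")  # matches bayes_search.best_score_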
Let’s plot the objective next
# Plot and store the figure and axes
import matplotlib.collections as mcoll
fig, ax = plt.subplots(figsize=(14, 8))
plot_objective(bayes_search.optimizer_results_[0], ax=ax)
# Find contour (QuadMesh) or scatter (PathCollection) artists in every subplot
for sub_ax in fig.get_axes():
    for collection in sub_ax.collections:
        if isinstance(collection, (mcoll.QuadMesh, mcoll.PathCollection)):
            # Add a colorbar for this collection
            cbar = fig.colorbar(collection, ax=sub_ax)
            cbar.set_label('Objective Value')
            break
plt.tight_layout()
plt.show()
Let’s plot evaluations next
from skopt.plots import plot_evaluations
plot_evaluations(bayes_search.optimizer_results_[0], bins=10)
plt.show()
Let’s visualize parameter distributions next
# Get hyperparameter names from BayesSearchCV's search_spaces
results = bayes_search.cv_results_
best_params = bayes_search.best_params_
# Convert results to DataFrame
results_df = pd.DataFrame(results)
# Create plots for each parameter
fig, axes = plt.subplots(1, len(best_params), figsize=(15, 4))
params = list(best_params.keys())
for i, param in enumerate(params):
param_name = f'param_{param}'
# Extract parameter values
param_values = results_df[param_name].values
# Scatter plot: parameter value vs performance
axes[i].scatter(param_values, results_df['mean_test_score'])
axes[i].set_xlabel(param)
axes[i].set_ylabel('Mean Test Score')
axes[i].axvline(best_params[param], color='r', linestyle='--', label='Best value')
axes[i].legend()
plt.tight_layout()
plt.show()
12.5.2.1 Next Steps for Hyperparameter Tuning
Refine Search Space
- Based on scatter plots, focus tuning on the most promising regions:
- learning_rate: 0.01–0.1 (red stars clustered around 0.05)
- n_estimators: 50–200 (indicated by clustering and red stars)
- max_depth: 5–15 (majority of samples fall in this range)
- subsample: 0.6–0.9
- colsample_bytree: 0.6–0.9
- gamma: 0–2 (most samples are below 2)
- Narrowing the hyperparameter space reduces computational cost and concentrates the search on high-performing regions.
Increase Iterations
- The convergence plot suggests the search may have stabilized, but increasing the number of iterations (e.g., n_iter=75) within the refined space could help confirm the optimum.
%%time
# refine the search space for Bayesian optimization
search_space = {
'n_estimators': Integer(50, 500),
'max_depth': Integer(5, 15),
'learning_rate': Real(0.01, 0.1, prior='uniform'),
'subsample': Real(0.6, 0.9, prior='uniform'),
'colsample_bytree': Real(0.6, 0.9, prior='uniform'),
'gamma': Real(0, 2, prior='uniform'),
}
# define the model
xgb = XGBRegressor(objective='reg:squarederror', enable_categorical=True, random_state=42)
# define the Bayesian optimization search
bayes_search = BayesSearchCV(
xgb,
search_space,
n_iter=75,
scoring='neg_root_mean_squared_error',
cv=3,
n_jobs=-1,
random_state=42,
verbose=0
)
# fit the model
bayes_search.fit(X_train, y_train)
CPU times: total: 1min 37s
Wall time: 2min 59s
BayesSearchCV(cv=3,
              estimator=XGBRegressor(base_score=None, booster=None,
                                     callbacks=None, colsample_bylevel=None,
                                     colsample_bynode=None,
                                     colsample_bytree=None, device=None,
                                     early_stopping_rounds=None,
                                     enable_categorical=True, eval_metric=None,
                                     feature_types=None, feature_weights=None,
                                     gamma=None, grow_policy=None,
                                     importance_type=None,
                                     interaction_constraints=None...
              'gamma': Real(low=0, high=2, prior='uniform', transform='normalize'),
              'learning_rate': Real(low=0.01, high=0.1, prior='uniform', transform='normalize'),
              'max_depth': Integer(low=5, high=15, prior='uniform', transform='normalize'),
              'n_estimators': Integer(low=50, high=500, prior='uniform', transform='normalize'),
              'subsample': Real(low=0.6, high=0.9, prior='uniform', transform='normalize')})
XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=0.60000489767951, device=None,
             early_stopping_rounds=None, enable_categorical=True,
             eval_metric=None, feature_types=None, feature_weights=None,
             gamma=1.3194686223941245, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=0.024138854425252557,
             max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=6, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             multi_strategy=None, n_estimators=409, n_jobs=None,
             num_parallel_tree=None, ...)
print('Bayesian Optimization Results: ')
# get the best parameters
best_params_bayes = bayes_search.best_params_
print('Best Parameters: ')
print(best_params_bayes)
# get the best score
best_score_bayes = bayes_search.best_score_
print('Best CV Score: ', best_score_bayes)
# make predictions
y_pred_bayes = bayes_search.predict(X_test)
# calculate the RMSE
rmse_bayes = root_mean_squared_error(y_test, y_pred_bayes)
print('Test RMSE: ', rmse_bayes)
# calculate the R2 score
r2_bayes = r2_score(y_test, y_pred_bayes)
print('Test R2: ', r2_bayes)
Bayesian Optimization Results:
Best Parameters:
OrderedDict({'colsample_bytree': 0.60000489767951, 'gamma': 1.3194686223941245, 'learning_rate': 0.024138854425252557, 'max_depth': 6, 'n_estimators': 409, 'subsample': 0.6717795261110148})
Best CV Score: -2997.3878580729165
Test RMSE: 3131.67626953125
Test R2: 0.9665706753730774
As can be seen, refining the search space and increasing the number of iterations further reduced the test RMSE (from about 3236 to 3132).
# plot the convergence of the refined search
bayes_res = bayes_search.optimizer_results_[0]
plot_convergence(bayes_res)
plt.show()
With 75 iterations, the search appears sufficient, as the objective value stabilizes.
Alternatively, consider switching to Optuna, which offers advanced features like pruning, allowing it to stop unpromising trials early and focus computational effort on more promising regions of the search space.
Optuna is often the preferred choice for tuning tree-based models such as XGBoost, LightGBM, and CatBoost, thanks to its intelligent Tree-structured Parzen Estimator (TPE) search algorithm and efficient pruning strategy.
12.5.3 2. Optuna (Advanced TPE-Based Optimization)
Optuna is a modern hyperparameter optimization framework that uses Tree-structured Parzen Estimator (TPE) and pruning to efficiently explore the hyperparameter space.
How it works: Models the hyperparameter space probabilistically and prioritizes promising combinations, stopping unpromising trials early (pruning).
Pros:
- Highly efficient due to pruning, saving compute time for expensive models like boosting trees.
- Flexible, supporting dynamic search spaces and easy integration with XGBoost, LightGBM, and CatBoost.
- Often finds better hyperparameters with fewer trials compared to RandomizedSearchCV.
Cons:
- Requires slightly more setup (e.g., defining an objective function) than scikit-learn tools.
Library: optuna.
Use Case: Best for most boosting-model tasks, especially with large datasets or complex hyperparameter spaces.
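The chapter’s objective below returns only a final cross-validated score, so nothing is ever pruned; pruning needs intermediate feedback. Here is a minimal, self-contained sketch of the reporting pattern, with a hypothetical staged loss standing in for a real model trained in stages:
import optuna

def pruned_objective(trial):
    lr = trial.suggest_float('learning_rate', 0.01, 0.3)
    score = float('inf')
    for step in range(10):  # e.g., growing an iterative model in stages
        score = (lr - 0.05) ** 2 + 1.0 / (step + 1)  # hypothetical staged loss
        trial.report(score, step)   # hand the intermediate value to the pruner
        if trial.should_prune():    # pruner compares this trial to its peers
            raise optuna.TrialPruned()
    return score

demo_study = optuna.create_study(direction='minimize',
                                 pruner=optuna.pruners.MedianPruner(n_warmup_steps=3))
demo_study.optimize(pruned_objective, n_trials=20)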
%%time
# use optuna
import optuna
from sklearn.model_selection import cross_val_score
from optuna import create_study
def objective(trial):
# Define the hyperparameters to tune
n_estimators = trial.suggest_int('n_estimators', 50, 500)
max_depth = trial.suggest_int('max_depth', 5, 30)
learning_rate = trial.suggest_float('learning_rate', 0.01, 0.3)
subsample = trial.suggest_float('subsample', 0.5, 1.0)
colsample_bytree = trial.suggest_float('colsample_bytree', 0.5, 1.0)
gamma = trial.suggest_float('gamma', 0, 5)
# Create the model
model = XGBRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
learning_rate=learning_rate,
subsample=subsample,
colsample_bytree=colsample_bytree,
gamma=gamma,
objective='reg:squarederror',
enable_categorical=True,
random_state=42
)
# Perform cross-validation
scores = cross_val_score(model, X_train, y_train, cv=3, scoring='neg_root_mean_squared_error')
return -scores.mean()
# Create a study object
study = create_study(direction='minimize')
# Optimize the objective function
study.optimize(objective, n_trials=50)
[I 2025-05-18 14:07:55,357] A new study created in memory with name: no-name-f3ce2686-4303-4134-ba07-01df44899c17
[I 2025-05-18 14:07:57,058] Trial 0 finished with value: 3331.6385904947915 and parameters: {'n_estimators': 110, 'max_depth': 13, 'learning_rate': 0.18884203283886744, 'subsample': 0.5854526666440897, 'colsample_bytree': 0.8937738801429804, 'gamma': 2.6946659405421385}. Best is trial 0 with value: 3331.6385904947915.
[I 2025-05-18 14:08:07,356] Trial 1 finished with value: 3532.982177734375 and parameters: {'n_estimators': 405, 'max_depth': 24, 'learning_rate': 0.16910832412869004, 'subsample': 0.9737607498458992, 'colsample_bytree': 0.5976945917277403, 'gamma': 2.817167693022663}. Best is trial 0 with value: 3331.6385904947915.
[I 2025-05-18 14:08:14,607] Trial 2 finished with value: 3277.841552734375 and parameters: {'n_estimators': 149, 'max_depth': 17, 'learning_rate': 0.04812097737053897, 'subsample': 0.9546726621457335, 'colsample_bytree': 0.8450595576069481, 'gamma': 2.3750014829760397}. Best is trial 2 with value: 3277.841552734375.
[I 2025-05-18 14:08:31,440] Trial 3 finished with value: 3288.1484375 and parameters: {'n_estimators': 239, 'max_depth': 28, 'learning_rate': 0.07547917129486585, 'subsample': 0.8139622919192127, 'colsample_bytree': 0.614390583668776, 'gamma': 4.977457905957452}. Best is trial 2 with value: 3277.841552734375.
[I 2025-05-18 14:08:33,724] Trial 4 finished with value: 3442.0733235677085 and parameters: {'n_estimators': 158, 'max_depth': 11, 'learning_rate': 0.04820580828068148, 'subsample': 0.9740363627954242, 'colsample_bytree': 0.5255147944629655, 'gamma': 3.2856067984172883}. Best is trial 2 with value: 3277.841552734375.
[I 2025-05-18 14:08:46,455] Trial 5 finished with value: 3532.2062174479165 and parameters: {'n_estimators': 427, 'max_depth': 18, 'learning_rate': 0.26452891035565, 'subsample': 0.5071558754145372, 'colsample_bytree': 0.9173489957689693, 'gamma': 4.150003935321765}. Best is trial 2 with value: 3277.841552734375.
[I 2025-05-18 14:09:01,942] Trial 6 finished with value: 3454.6267903645835 and parameters: {'n_estimators': 311, 'max_depth': 26, 'learning_rate': 0.1202287548017663, 'subsample': 0.8982432580618387, 'colsample_bytree': 0.561962457887061, 'gamma': 2.171604318548932}. Best is trial 2 with value: 3277.841552734375.
[I 2025-05-18 14:09:03,372] Trial 7 finished with value: 3168.041748046875 and parameters: {'n_estimators': 259, 'max_depth': 6, 'learning_rate': 0.15479906550009093, 'subsample': 0.722784330360674, 'colsample_bytree': 0.9717808152882963, 'gamma': 0.16914348890876785}. Best is trial 7 with value: 3168.041748046875.
[I 2025-05-18 14:09:13,799] Trial 8 finished with value: 3375.4777018229165 and parameters: {'n_estimators': 232, 'max_depth': 25, 'learning_rate': 0.24377703631670514, 'subsample': 0.7042201589884167, 'colsample_bytree': 0.7045843722953715, 'gamma': 3.672277794383117}. Best is trial 7 with value: 3168.041748046875.
[I 2025-05-18 14:09:14,891] Trial 9 finished with value: 3246.5836588541665 and parameters: {'n_estimators': 249, 'max_depth': 5, 'learning_rate': 0.24904510550714254, 'subsample': 0.5364967258658128, 'colsample_bytree': 0.870248309348203, 'gamma': 0.8871442748221514}. Best is trial 7 with value: 3168.041748046875.
[I 2025-05-18 14:09:16,406] Trial 10 finished with value: 3110.5743815104165 and parameters: {'n_estimators': 347, 'max_depth': 5, 'learning_rate': 0.12363016428276806, 'subsample': 0.6828890061277565, 'colsample_bytree': 0.9913288594327404, 'gamma': 0.028595940344607662}. Best is trial 10 with value: 3110.5743815104165.
[I 2025-05-18 14:09:17,960] Trial 11 finished with value: 3080.7557779947915 and parameters: {'n_estimators': 344, 'max_depth': 5, 'learning_rate': 0.11966202039748526, 'subsample': 0.6966666409771092, 'colsample_bytree': 0.9907568324552126, 'gamma': 0.03193020721640565}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:09:22,688] Trial 12 finished with value: 3226.2923990885415 and parameters: {'n_estimators': 355, 'max_depth': 9, 'learning_rate': 0.11711488725389262, 'subsample': 0.631951758513077, 'colsample_bytree': 0.9983350666437509, 'gamma': 1.27168447617301}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:09:27,330] Trial 13 finished with value: 3197.4087727864585 and parameters: {'n_estimators': 485, 'max_depth': 8, 'learning_rate': 0.10490346463062429, 'subsample': 0.8054330102331452, 'colsample_bytree': 0.7392833809068201, 'gamma': 0.0009708831636260976}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:09:36,167] Trial 14 finished with value: 3265.5059407552085 and parameters: {'n_estimators': 350, 'max_depth': 14, 'learning_rate': 0.010974084771641884, 'subsample': 0.657436989440511, 'colsample_bytree': 0.9444260043103783, 'gamma': 1.1415074503215086}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:09:38,021] Trial 15 finished with value: 3211.10498046875 and parameters: {'n_estimators': 414, 'max_depth': 5, 'learning_rate': 0.20276262694735223, 'subsample': 0.7858874877363067, 'colsample_bytree': 0.8156010945025101, 'gamma': 0.6895297725422393}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:09:51,554] Trial 16 finished with value: 3234.952392578125 and parameters: {'n_estimators': 325, 'max_depth': 20, 'learning_rate': 0.14214931890640023, 'subsample': 0.6610397075583099, 'colsample_bytree': 0.7954031984128783, 'gamma': 1.5656159074060274}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:09:57,644] Trial 17 finished with value: 3401.1375325520835 and parameters: {'n_estimators': 496, 'max_depth': 9, 'learning_rate': 0.29514668447615683, 'subsample': 0.8632373925552281, 'colsample_bytree': 0.9436381134524987, 'gamma': 0.5407976360085849}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:10:05,745] Trial 18 finished with value: 3174.0157877604165 and parameters: {'n_estimators': 380, 'max_depth': 13, 'learning_rate': 0.09061151958846106, 'subsample': 0.5954891745420089, 'colsample_bytree': 0.710343809975879, 'gamma': 1.6534497188298685}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:10:08,286] Trial 19 finished with value: 3356.6737467447915 and parameters: {'n_estimators': 292, 'max_depth': 8, 'learning_rate': 0.21357082994597948, 'subsample': 0.7407503163269148, 'colsample_bytree': 0.9833552054892948, 'gamma': 0.267820475624959}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:10:22,814] Trial 20 finished with value: 3189.736572265625 and parameters: {'n_estimators': 450, 'max_depth': 16, 'learning_rate': 0.04617199108711706, 'subsample': 0.6865570474083165, 'colsample_bytree': 0.7772785972725732, 'gamma': 1.5897266883914987}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:10:23,667] Trial 21 finished with value: 3140.2428385416665 and parameters: {'n_estimators': 184, 'max_depth': 5, 'learning_rate': 0.14599106624971991, 'subsample': 0.7301791371379666, 'colsample_bytree': 0.9701812221998805, 'gamma': 0.04016923066597301}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:10:24,836] Trial 22 finished with value: 3120.2205403645835 and parameters: {'n_estimators': 198, 'max_depth': 6, 'learning_rate': 0.13628764120275988, 'subsample': 0.7215881017734654, 'colsample_bytree': 0.9106993240806615, 'gamma': 0.7032606459017747}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:10:27,854] Trial 23 finished with value: 3221.032958984375 and parameters: {'n_estimators': 213, 'max_depth': 10, 'learning_rate': 0.12311335018092498, 'subsample': 0.7597429195477536, 'colsample_bytree': 0.9099463398246272, 'gamma': 0.6233266609610246}. Best is trial 11 with value: 3080.7557779947915.
[I 2025-05-18 14:10:28,615] Trial 24 finished with value: 3057.6537272135415 and parameters: {'n_estimators': 81, 'max_depth': 7, 'learning_rate': 0.078715831835235, 'subsample': 0.5951589403340591, 'colsample_bytree': 0.8553935335757352, 'gamma': 0.9594533088744169}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:10:29,930] Trial 25 finished with value: 3171.862060546875 and parameters: {'n_estimators': 53, 'max_depth': 11, 'learning_rate': 0.08446318688885693, 'subsample': 0.605414568663358, 'colsample_bytree': 0.8600691427823842, 'gamma': 1.145614241815684}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:10:30,422] Trial 26 finished with value: 3118.3546549479165 and parameters: {'n_estimators': 64, 'max_depth': 7, 'learning_rate': 0.1763446382221982, 'subsample': 0.5599604709276149, 'colsample_bytree': 0.8324660495637575, 'gamma': 2.024470151715155}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:10:47,624] Trial 27 finished with value: 3235.0696614583335 and parameters: {'n_estimators': 351, 'max_depth': 21, 'learning_rate': 0.07197065204515055, 'subsample': 0.6467002339598596, 'colsample_bytree': 0.9487536672551595, 'gamma': 0.37977717359101093}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:10:49,614] Trial 28 finished with value: 3097.6119791666665 and parameters: {'n_estimators': 285, 'max_depth': 7, 'learning_rate': 0.09471444392323464, 'subsample': 0.6817657768958778, 'colsample_bytree': 0.8759056819860906, 'gamma': 0.9317315963616888}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:10:51,215] Trial 29 finished with value: 3725.931640625 and parameters: {'n_estimators': 97, 'max_depth': 12, 'learning_rate': 0.026960028516967147, 'subsample': 0.6139258759406401, 'colsample_bytree': 0.6628749649677554, 'gamma': 0.9646011976562785}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:10:59,064] Trial 30 finished with value: 3169.0587565104165 and parameters: {'n_estimators': 285, 'max_depth': 15, 'learning_rate': 0.09896665826089661, 'subsample': 0.5620050505228791, 'colsample_bytree': 0.8841366488845119, 'gamma': 1.736075725816671}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:11:01,242] Trial 31 finished with value: 3126.4650065104165 and parameters: {'n_estimators': 317, 'max_depth': 7, 'learning_rate': 0.0680504082792042, 'subsample': 0.681613021416451, 'colsample_bytree': 0.8925460900423244, 'gamma': 0.3739014472357273}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:11:03,903] Trial 32 finished with value: 3147.3534342447915 and parameters: {'n_estimators': 378, 'max_depth': 7, 'learning_rate': 0.1065619328138134, 'subsample': 0.6315716106994695, 'colsample_bytree': 0.9388945875286986, 'gamma': 2.6908234991554925}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:11:07,608] Trial 33 finished with value: 3277.1427408854165 and parameters: {'n_estimators': 274, 'max_depth': 10, 'learning_rate': 0.16713465741205957, 'subsample': 0.683331861347265, 'colsample_bytree': 0.9966072329286155, 'gamma': 1.3938586268372646}. Best is trial 24 with value: 3057.6537272135415.
[I 2025-05-18 14:11:09,751] Trial 34 finished with value: 3032.910888671875 and parameters: {'n_estimators': 381, 'max_depth': 5, 'learning_rate': 0.06252869239380691, 'subsample': 0.7690084081861999, 'colsample_bytree': 0.8409337377788446, 'gamma': 0.9336069874026807}. Best is trial 34 with value: 3032.910888671875.
[I 2025-05-18 14:11:13,047] Trial 35 finished with value: 3155.891357421875 and parameters: {'n_estimators': 390, 'max_depth': 8, 'learning_rate': 0.05772737980917477, 'subsample': 0.7727672665104645, 'colsample_bytree': 0.8382329436363554, 'gamma': 1.911793066576803}. Best is trial 34 with value: 3032.910888671875.
[I 2025-05-18 14:11:16,438] Trial 36 finished with value: 3163.3846028645835 and parameters: {'n_estimators': 138, 'max_depth': 12, 'learning_rate': 0.03124181345416563, 'subsample': 0.8286701237393919, 'colsample_bytree': 0.7653391398221429, 'gamma': 0.8812646755072657}. Best is trial 34 with value: 3032.910888671875.
[I 2025-05-18 14:11:41,945] Trial 37 finished with value: 3288.4474283854165 and parameters: {'n_estimators': 438, 'max_depth': 30, 'learning_rate': 0.07977446319685096, 'subsample': 0.9292180638154401, 'colsample_bytree': 0.8089212564431525, 'gamma': 3.040405215443998}. Best is trial 34 with value: 3032.910888671875.
[I 2025-05-18 14:11:45,381] Trial 38 finished with value: 3190.393798828125 and parameters: {'n_estimators': 325, 'max_depth': 9, 'learning_rate': 0.0607949320144306, 'subsample': 0.8368624109449146, 'colsample_bytree': 0.8622021741277767, 'gamma': 2.3450412092952115}. Best is trial 34 with value: 3032.910888671875.
[I 2025-05-18 14:11:47,866] Trial 39 finished with value: 3047.644775390625 and parameters: {'n_estimators': 455, 'max_depth': 6, 'learning_rate': 0.03783091966336392, 'subsample': 0.5685480281791342, 'colsample_bytree': 0.8410930211661991, 'gamma': 0.9928385438344886}. Best is trial 34 with value: 3032.910888671875.
[I 2025-05-18 14:11:50,005] Trial 40 finished with value: 2999.4031575520835 and parameters: {'n_estimators': 401, 'max_depth': 6, 'learning_rate': 0.03991992047485554, 'subsample': 0.5117342764759609, 'colsample_bytree': 0.7318093708244227, 'gamma': 4.35126402777801}. Best is trial 40 with value: 2999.4031575520835.
[I 2025-05-18 14:11:53,128] Trial 41 finished with value: 3000.5756022135415 and parameters: {'n_estimators': 468, 'max_depth': 6, 'learning_rate': 0.03638902988416309, 'subsample': 0.511431478578078, 'colsample_bytree': 0.7383564756305684, 'gamma': 4.796159164563539}. Best is trial 40 with value: 2999.4031575520835.
[I 2025-05-18 14:11:55,478] Trial 42 finished with value: 2991.45703125 and parameters: {'n_estimators': 451, 'max_depth': 6, 'learning_rate': 0.03254778860582558, 'subsample': 0.5230436555378349, 'colsample_bytree': 0.7356860621113286, 'gamma': 4.747158547696816}. Best is trial 42 with value: 2991.45703125.
[I 2025-05-18 14:11:57,923] Trial 43 finished with value: 2990.7191569010415 and parameters: {'n_estimators': 466, 'max_depth': 6, 'learning_rate': 0.03084135866822044, 'subsample': 0.502564756808311, 'colsample_bytree': 0.6563931619463742, 'gamma': 4.887027549415513}. Best is trial 43 with value: 2990.7191569010415.
[I 2025-05-18 14:12:03,726] Trial 44 finished with value: 3033.5712076822915 and parameters: {'n_estimators': 471, 'max_depth': 10, 'learning_rate': 0.01468098928659789, 'subsample': 0.5008951225164894, 'colsample_bytree': 0.6756587019419354, 'gamma': 4.962494330375168}. Best is trial 43 with value: 2990.7191569010415.
[I 2025-05-18 14:12:05,864] Trial 45 finished with value: 2999.9879557291665 and parameters: {'n_estimators': 417, 'max_depth': 6, 'learning_rate': 0.021606661037487623, 'subsample': 0.5288333074288567, 'colsample_bytree': 0.6186918608892495, 'gamma': 4.619632551925978}. Best is trial 43 with value: 2990.7191569010415.
[I 2025-05-18 14:12:09,064] Trial 46 finished with value: 3025.92626953125 and parameters: {'n_estimators': 409, 'max_depth': 8, 'learning_rate': 0.023126790924218966, 'subsample': 0.5345419189292403, 'colsample_bytree': 0.6008910344234353, 'gamma': 4.6203027358091635}. Best is trial 43 with value: 2990.7191569010415.
[I 2025-05-18 14:12:27,174] Trial 47 finished with value: 3161.40185546875 and parameters: {'n_estimators': 477, 'max_depth': 23, 'learning_rate': 0.04131664641254084, 'subsample': 0.5375997658247437, 'colsample_bytree': 0.6325464591765605, 'gamma': 4.045095333144472}. Best is trial 43 with value: 2990.7191569010415.
[I 2025-05-18 14:12:29,501] Trial 48 finished with value: 3104.0904134114585 and parameters: {'n_estimators': 459, 'max_depth': 6, 'learning_rate': 0.05162428908533369, 'subsample': 0.5209638236554095, 'colsample_bytree': 0.5499729255374505, 'gamma': 4.424201509909468}. Best is trial 43 with value: 2990.7191569010415.
[I 2025-05-18 14:12:34,440] Trial 49 finished with value: 3205.3310546875 and parameters: {'n_estimators': 424, 'max_depth': 9, 'learning_rate': 0.01062489230890902, 'subsample': 0.999521469464556, 'colsample_bytree': 0.732915260237857, 'gamma': 3.7623763401087156}. Best is trial 43 with value: 2990.7191569010415.
CPU times: total: 25min 51s
Wall time: 4min 39s
print('Optuna Optimization Results: ')
# get the best parameters
best_params_optuna = study.best_params
print('Best Parameters: ')
print(best_params_optuna)
# get the best score
best_score_optuna = study.best_value
print('Best CV Score: ', best_score_optuna)
Optuna Optimization Results:
Best Parameters:
{'n_estimators': 466, 'max_depth': 6, 'learning_rate': 0.03084135866822044, 'subsample': 0.502564756808311, 'colsample_bytree': 0.6563931619463742, 'gamma': 4.887027549415513}
Best CV Score: 2990.7191569010415
# retrain the model with the best parameters
best_model = XGBRegressor(
**best_params_optuna,
objective='reg:squarederror',
enable_categorical=True,
random_state=42
)
# fit the model
best_model.fit(X_train, y_train)
# make predictions
y_pred_optuna = best_model.predict(X_test)
# calculate the RMSE
rmse_optuna = root_mean_squared_error(y_test, y_pred_optuna)
print('Test RMSE: ', rmse_optuna)
# calculate the R2 score
r2_optuna = r2_score(y_test, y_pred_optuna)
print('Test R2: ', r2_optuna)
Test RMSE: 3096.413818359375
Test R2: 0.9673192501068115
Let’s visualize the tuning process
import optuna.visualization as vis
import plotly.io as pio
# Generate figures
fig1 = vis.plot_optimization_history(study)
fig1.show()
fig2 = vis.plot_param_importances(study)
fig2.show()
fig3 = vis.plot_slice(study)
fig3.show()
(Interactive Plotly figures: optimization history, hyperparameter importances, and slice plots. They render only in a live notebook session.)
Optuna can also run trials in parallel: pass n_jobs=-1 to study.optimize.
%%time
def objective(trial):
# Define the hyperparameters to tune
n_estimators = trial.suggest_int('n_estimators', 50, 500)
max_depth = trial.suggest_int('max_depth', 5, 30)
learning_rate = trial.suggest_float('learning_rate', 0.01, 0.3)
subsample = trial.suggest_float('subsample', 0.5, 1.0)
colsample_bytree = trial.suggest_float('colsample_bytree', 0.5, 1.0)
gamma = trial.suggest_float('gamma', 0, 5)
# Create the model
model = XGBRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
learning_rate=learning_rate,
subsample=subsample,
colsample_bytree=colsample_bytree,
gamma=gamma,
objective='reg:squarederror',
enable_categorical=True,
random_state=42
)
# Perform cross-validation
scores = cross_val_score(model, X_train, y_train, cv=3, scoring='neg_root_mean_squared_error')
return -scores.mean()
parallel_study = create_study(direction='minimize')
parallel_study.optimize(objective, n_trials=50, n_jobs=-1)
[I 2025-05-18 14:34:24,598] A new study created in memory with name: no-name-8c4110c7-8504-4b30-8239-4ca7a887a77e
[I 2025-05-18 14:34:44,688] Trial 3 finished with value: 3386.8870442708335 and parameters: {'n_estimators': 186, 'max_depth': 7, 'learning_rate': 0.016917121150508863, 'subsample': 0.6205391387487309, 'colsample_bytree': 0.9226243614668972, 'gamma': 0.7198073118252574}. Best is trial 3 with value: 3386.8870442708335.
[I 2025-05-18 14:34:51,587] Trial 15 finished with value: 3225.1217447916665 and parameters: {'n_estimators': 336, 'max_depth': 5, 'learning_rate': 0.22071165224743958, 'subsample': 0.9479603262439835, 'colsample_bytree': 0.633493959052336, 'gamma': 4.219844381569382}. Best is trial 15 with value: 3225.1217447916665.
[I 2025-05-18 14:34:56,685] Trial 0 finished with value: 3780.4537760416665 and parameters: {'n_estimators': 75, 'max_depth': 28, 'learning_rate': 0.08734595383748141, 'subsample': 0.643816201392093, 'colsample_bytree': 0.5096005859401787, 'gamma': 2.346236402110034}. Best is trial 15 with value: 3225.1217447916665.
[I 2025-05-18 14:35:00,827] Trial 1 finished with value: 3492.1652018229165 and parameters: {'n_estimators': 89, 'max_depth': 22, 'learning_rate': 0.26936321457491236, 'subsample': 0.9260739983960038, 'colsample_bytree': 0.7892145514084559, 'gamma': 0.03440365361497333}. Best is trial 15 with value: 3225.1217447916665.
[I 2025-05-18 14:35:04,073] Trial 13 finished with value: 3158.33447265625 and parameters: {'n_estimators': 283, 'max_depth': 9, 'learning_rate': 0.0696919546852828, 'subsample': 0.5001580826879467, 'colsample_bytree': 0.8923178791715032, 'gamma': 4.714268285930873}. Best is trial 13 with value: 3158.33447265625.
[I 2025-05-18 14:35:04,526] Trial 17 finished with value: 3402.6753743489585 and parameters: {'n_estimators': 61, 'max_depth': 13, 'learning_rate': 0.22097576760857726, 'subsample': 0.5157400674693156, 'colsample_bytree': 0.9153999202038743, 'gamma': 3.8810641950136353}. Best is trial 13 with value: 3158.33447265625.
[I 2025-05-18 14:35:09,821] Trial 10 finished with value: 3173.780517578125 and parameters: {'n_estimators': 115, 'max_depth': 20, 'learning_rate': 0.05718427703256109, 'subsample': 0.7951084114071031, 'colsample_bytree': 0.6691682768529876, 'gamma': 0.9524397379855204}. Best is trial 13 with value: 3158.33447265625.
[I 2025-05-18 14:35:12,310] Trial 20 finished with value: 3132.5121256510415 and parameters: {'n_estimators': 67, 'max_depth': 7, 'learning_rate': 0.18097723795904094, 'subsample': 0.6565256472443639, 'colsample_bytree': 0.8492487356738951, 'gamma': 3.094448539748332}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:27,118] Trial 5 finished with value: 3247.8447265625 and parameters: {'n_estimators': 185, 'max_depth': 18, 'learning_rate': 0.12450637517625557, 'subsample': 0.6530395458833994, 'colsample_bytree': 0.9123640605975669, 'gamma': 4.153197251161178}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:32,054] Trial 2 finished with value: 3298.5411783854165 and parameters: {'n_estimators': 198, 'max_depth': 19, 'learning_rate': 0.1625956401464969, 'subsample': 0.6454388243734961, 'colsample_bytree': 0.6773968413189418, 'gamma': 0.6809545266278161}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:38,519] Trial 19 finished with value: 3206.009765625 and parameters: {'n_estimators': 389, 'max_depth': 6, 'learning_rate': 0.16121536164972963, 'subsample': 0.5021218552699114, 'colsample_bytree': 0.5687043089885795, 'gamma': 1.9151520851034594}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:43,501] Trial 8 finished with value: 3190.1316731770835 and parameters: {'n_estimators': 185, 'max_depth': 24, 'learning_rate': 0.04861252706056661, 'subsample': 0.7514014541546887, 'colsample_bytree': 0.5947087728251452, 'gamma': 2.347498071325193}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:52,862] Trial 24 finished with value: 3393.3465169270835 and parameters: {'n_estimators': 268, 'max_depth': 6, 'learning_rate': 0.26658647968026516, 'subsample': 0.9687611646582986, 'colsample_bytree': 0.6473978497637801, 'gamma': 2.112593418725748}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:54,416] Trial 9 finished with value: 3281.0540364583335 and parameters: {'n_estimators': 267, 'max_depth': 18, 'learning_rate': 0.09626803123217077, 'subsample': 0.911374738293637, 'colsample_bytree': 0.8773073634628092, 'gamma': 3.481941940869614}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:57,296] Trial 4 finished with value: 3236.0377604166665 and parameters: {'n_estimators': 355, 'max_depth': 15, 'learning_rate': 0.10929163313929664, 'subsample': 0.7658160498873094, 'colsample_bytree': 0.7163396553124712, 'gamma': 2.652745236329293}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:35:58,600] Trial 16 finished with value: 3217.8741861979165 and parameters: {'n_estimators': 350, 'max_depth': 12, 'learning_rate': 0.02298579332922481, 'subsample': 0.9869225199063424, 'colsample_bytree': 0.7703334112529645, 'gamma': 0.9673314117244886}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:02,363] Trial 12 finished with value: 3801.4449869791665 and parameters: {'n_estimators': 323, 'max_depth': 18, 'learning_rate': 0.2970802052171264, 'subsample': 0.6557456264021807, 'colsample_bytree': 0.6426755505446563, 'gamma': 3.296566349997323}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:04,137] Trial 23 finished with value: 3145.9025065104165 and parameters: {'n_estimators': 149, 'max_depth': 20, 'learning_rate': 0.11627192901824128, 'subsample': 0.5531936628837026, 'colsample_bytree': 0.8862514403683301, 'gamma': 2.8135869506363886}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:04,702] Trial 21 finished with value: 3610.6133626302085 and parameters: {'n_estimators': 160, 'max_depth': 22, 'learning_rate': 0.2999769958923947, 'subsample': 0.5067125540912617, 'colsample_bytree': 0.8981157580104169, 'gamma': 4.806320208954713}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:06,371] Trial 14 finished with value: 3340.8546549479165 and parameters: {'n_estimators': 245, 'max_depth': 25, 'learning_rate': 0.17652467395345062, 'subsample': 0.611813588138444, 'colsample_bytree': 0.7342047981839466, 'gamma': 2.307485804181127}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:18,414] Trial 6 finished with value: 3459.354248046875 and parameters: {'n_estimators': 292, 'max_depth': 27, 'learning_rate': 0.14966590769182178, 'subsample': 0.8644768034280883, 'colsample_bytree': 0.585581150985163, 'gamma': 4.2857336056873905}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:19,888] Trial 18 finished with value: 3383.8411458333335 and parameters: {'n_estimators': 198, 'max_depth': 26, 'learning_rate': 0.21556692279966222, 'subsample': 0.5771557052612845, 'colsample_bytree': 0.9826376798012539, 'gamma': 4.5858180068589025}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:32,659] Trial 36 finished with value: 3416.326171875 and parameters: {'n_estimators': 137, 'max_depth': 9, 'learning_rate': 0.19558438716354623, 'subsample': 0.5673630855731072, 'colsample_bytree': 0.9859366413577002, 'gamma': 3.023613531245067}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:36,836] Trial 7 finished with value: 3498.0222981770835 and parameters: {'n_estimators': 410, 'max_depth': 23, 'learning_rate': 0.20412014730717506, 'subsample': 0.7787220663675116, 'colsample_bytree': 0.6104047744776355, 'gamma': 3.7782931152361905}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:45,340] Trial 22 finished with value: 3173.2679036458335 and parameters: {'n_estimators': 342, 'max_depth': 18, 'learning_rate': 0.0918444887804519, 'subsample': 0.6358151816674005, 'colsample_bytree': 0.8178206059616683, 'gamma': 2.3187025698274963}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:53,938] Trial 25 finished with value: 3260.6758626302085 and parameters: {'n_estimators': 474, 'max_depth': 13, 'learning_rate': 0.17976496530280672, 'subsample': 0.7974254896628823, 'colsample_bytree': 0.8171566934590944, 'gamma': 2.782580731869066}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:56,530] Trial 26 finished with value: 3207.96533203125 and parameters: {'n_estimators': 492, 'max_depth': 12, 'learning_rate': 0.11220378588280396, 'subsample': 0.7741909801743497, 'colsample_bytree': 0.7986588796025045, 'gamma': 3.1947401986428576}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:56,884] Trial 31 finished with value: 3403.3111165364585 and parameters: {'n_estimators': 482, 'max_depth': 10, 'learning_rate': 0.19587655584318253, 'subsample': 0.5695806262499338, 'colsample_bytree': 0.9939250598824909, 'gamma': 4.678107426427386}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:36:59,028] Trial 32 finished with value: 3394.078369140625 and parameters: {'n_estimators': 475, 'max_depth': 10, 'learning_rate': 0.18692130864394926, 'subsample': 0.5639172117854069, 'colsample_bytree': 0.9846996142831652, 'gamma': 4.703621516978366}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:00,622] Trial 27 finished with value: 3218.7230631510415 and parameters: {'n_estimators': 491, 'max_depth': 12, 'learning_rate': 0.11995416383753857, 'subsample': 0.5466551591729691, 'colsample_bytree': 0.8084353830772836, 'gamma': 4.925162229632911}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:05,596] Trial 30 finished with value: 3401.9541829427085 and parameters: {'n_estimators': 500, 'max_depth': 11, 'learning_rate': 0.2152643048780489, 'subsample': 0.5757354436323991, 'colsample_bytree': 0.811733236112631, 'gamma': 4.912356352479171}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:07,743] Trial 11 finished with value: 3266.311279296875 and parameters: {'n_estimators': 455, 'max_depth': 29, 'learning_rate': 0.13606170670586942, 'subsample': 0.6994896811489387, 'colsample_bytree': 0.7984754896325321, 'gamma': 4.09626197911706}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:07,953] Trial 28 finished with value: 3182.3749186197915 and parameters: {'n_estimators': 488, 'max_depth': 12, 'learning_rate': 0.12058008708744347, 'subsample': 0.558159655388937, 'colsample_bytree': 0.803756895143798, 'gamma': 3.3020110039626416}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:08,665] Trial 29 finished with value: 3314.0919596354165 and parameters: {'n_estimators': 488, 'max_depth': 12, 'learning_rate': 0.19983958431862286, 'subsample': 0.5547949555803708, 'colsample_bytree': 0.8342550435077614, 'gamma': 4.941482674825019}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:12,068] Trial 37 finished with value: 3134.2323404947915 and parameters: {'n_estimators': 483, 'max_depth': 9, 'learning_rate': 0.07480607389730205, 'subsample': 0.5676604688073226, 'colsample_bytree': 0.8150289462186665, 'gamma': 2.8866777653704037}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:21,475] Trial 44 finished with value: 3191.2845052083335 and parameters: {'n_estimators': 93, 'max_depth': 15, 'learning_rate': 0.06527491497109726, 'subsample': 0.711964328195734, 'colsample_bytree': 0.859090805004589, 'gamma': 1.639238662369149}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:22,158] Trial 45 finished with value: 3185.5326334635415 and parameters: {'n_estimators': 91, 'max_depth': 15, 'learning_rate': 0.06735780487715705, 'subsample': 0.7015638902916604, 'colsample_bytree': 0.8561282967129621, 'gamma': 1.7180282992248004}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:25,746] Trial 41 finished with value: 3219.3627115885415 and parameters: {'n_estimators': 236, 'max_depth': 10, 'learning_rate': 0.12794602478115893, 'subsample': 0.6954209090620064, 'colsample_bytree': 0.8670840861317897, 'gamma': 1.5273956878386659}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:27,160] Trial 49 finished with value: 3200.2061360677085 and parameters: {'n_estimators': 80, 'max_depth': 16, 'learning_rate': 0.07822655273812698, 'subsample': 0.7097404244580205, 'colsample_bytree': 0.855803024035887, 'gamma': 1.7360340110549568}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:28,636] Trial 47 finished with value: 3183.0796712239585 and parameters: {'n_estimators': 101, 'max_depth': 15, 'learning_rate': 0.07140184583521393, 'subsample': 0.710670554031622, 'colsample_bytree': 0.8547486119581857, 'gamma': 1.7173865194695117}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:29,253] Trial 38 finished with value: 3208.407958984375 and parameters: {'n_estimators': 465, 'max_depth': 10, 'learning_rate': 0.06658819756064548, 'subsample': 0.6995360789704701, 'colsample_bytree': 0.8341518409466513, 'gamma': 2.87278237478264}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:29,453] Trial 35 finished with value: 3316.11328125 and parameters: {'n_estimators': 443, 'max_depth': 15, 'learning_rate': 0.19403052858659814, 'subsample': 0.7068750385474248, 'colsample_bytree': 0.9925412520466177, 'gamma': 2.9755182321525635}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:31,653] Trial 48 finished with value: 3209.8841959635415 and parameters: {'n_estimators': 87, 'max_depth': 20, 'learning_rate': 0.07578210673049532, 'subsample': 0.7003441215271053, 'colsample_bytree': 0.8546647246781772, 'gamma': 1.7208102871319328}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:32,887] Trial 39 finished with value: 3200.1593424479165 and parameters: {'n_estimators': 492, 'max_depth': 10, 'learning_rate': 0.065333145587516, 'subsample': 0.6966584422268385, 'colsample_bytree': 0.8172215987900011, 'gamma': 1.6552485151252307}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:35,803] Trial 40 finished with value: 3249.5609537760415 and parameters: {'n_estimators': 475, 'max_depth': 10, 'learning_rate': 0.13465422545350136, 'subsample': 0.6978833200280267, 'colsample_bytree': 0.8370747967719179, 'gamma': 4.968542636826679}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:37,436] Trial 43 finished with value: 3238.1642252604165 and parameters: {'n_estimators': 234, 'max_depth': 15, 'learning_rate': 0.1275208269474538, 'subsample': 0.7058773643147542, 'colsample_bytree': 0.8551947162378626, 'gamma': 1.6458760605079936}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:40,436] Trial 46 finished with value: 3206.9140625 and parameters: {'n_estimators': 233, 'max_depth': 15, 'learning_rate': 0.06896196005667433, 'subsample': 0.6997340322682682, 'colsample_bytree': 0.8647504256146414, 'gamma': 1.4300308921007199}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:51,343] Trial 42 finished with value: 3237.631591796875 and parameters: {'n_estimators': 228, 'max_depth': 30, 'learning_rate': 0.13632890468699124, 'subsample': 0.7024617677120901, 'colsample_bytree': 0.8546145920857883, 'gamma': 1.7651829526887861}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:51,445] Trial 34 finished with value: 3386.8898111979165 and parameters: {'n_estimators': 492, 'max_depth': 30, 'learning_rate': 0.19771915593325928, 'subsample': 0.5711165410120198, 'colsample_bytree': 0.9985500871271142, 'gamma': 3.052090619977873}. Best is trial 20 with value: 3132.5121256510415.
[I 2025-05-18 14:37:51,670] Trial 33 finished with value: 3355.239990234375 and parameters: {'n_estimators': 495, 'max_depth': 30, 'learning_rate': 0.194602408723096, 'subsample': 0.5698582471685457, 'colsample_bytree': 0.9947157446155714, 'gamma': 3.1410187252506825}. Best is trial 20 with value: 3132.5121256510415.
CPU times: total: 26min 6s
Wall time: 3min 27s
The wall time dropped from 4 minutes 39 seconds to 3 minutes 27 seconds, a saving of 1 minute 12 seconds even on this small dataset. Note that the total CPU time (26 min 6 s) far exceeds the wall time, and the trial indices finish out of order in the log above: both signs that the trials were evaluated in parallel across cores, as sketched below.
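A minimal sketch of how such a parallel search can be launched with Optuna's n_jobs argument, assuming the objective(trial) function defined earlier for this problem:

```python
import optuna

# minimize the cross-validated RMSE returned by the objective
study = optuna.create_study(direction='minimize')
study.optimize(
    objective,    # assumed: the objective(trial) function defined earlier
    n_trials=50,
    n_jobs=-1,    # run trials concurrently on all available cores (thread-based)
)
best_params_optuna = study.best_params
```

With n_jobs=-1, Optuna runs trials in concurrent threads, which pays off when the objective spends most of its time in native code that releases the GIL, as XGBoost's training routine does.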
# retrain the model with the best parameters found by Optuna
best_model = XGBRegressor(
    **best_params_optuna,
    objective='reg:squarederror',
    enable_categorical=True,
    random_state=42
)
# fit the model on the training set
best_model.fit(X_train, y_train)
# make predictions on the test set
y_pred_optuna = best_model.predict(X_test)
# calculate the RMSE
rmse_optuna = root_mean_squared_error(y_test, y_pred_optuna)
print('Test RMSE: ', rmse_optuna)
# calculate the R2 score
r2_optuna = r2_score(y_test, y_pred_optuna)
print('Test R2: ', r2_optuna)
Test RMSE: 3096.413818359375
Test R2: 0.9673192501068115
The test metrics are essentially unchanged: Optuna reached the same model quality as the earlier searches in noticeably less wall time.
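Before moving on, you can inspect a finished study with Optuna's built-in plotting utilities; a minimal sketch, assuming the study object from the search above (the plotly import in this notebook already provides the plotting backend):

```python
import optuna

# how the best cross-validated RMSE improved over the trials
fig = optuna.visualization.plot_optimization_history(study)
fig.show()

# which hyperparameters mattered most for the objective
fig = optuna.visualization.plot_param_importances(study)
fig.show()
```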
Optuna supports multiple optimization strategies, with TPE as the default. You can customize its optimization behavior by selecting different samplers. The default settings are robust and adaptive, so for most practical use cases, tuning Optuna’s internal configuration is not necessary.
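Swapping samplers is a one-line change when creating the study; a minimal sketch of Optuna's built-in options:

```python
import optuna
from optuna.samplers import TPESampler, RandomSampler, CmaEsSampler

# TPE is the default; fixing the seed makes the search reproducible
study = optuna.create_study(direction='minimize', sampler=TPESampler(seed=42))

# alternatives: pure random search, or CMA-ES for continuous search spaces
# (CmaEsSampler requires the separate cmaes package)
# study = optuna.create_study(direction='minimize', sampler=RandomSampler(seed=42))
# study = optuna.create_study(direction='minimize', sampler=CmaEsSampler(seed=42))
```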
12.6 FLAML (Fast and Lightweight AutoML)
FLAML is a lightweight, efficient AutoML library developed by Microsoft Research. It is designed for fast, economical hyperparameter optimization and model selection, relying on cost-frugal search strategies rather than expensive Bayesian optimization.
First, you need to install FLAML:
pip install flaml
Collecting flaml
Downloading FLAML-2.3.4-py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: NumPy>=1.17 in c:\users\lsi8012\appdata\local\anaconda3\lib\site-packages (from flaml) (1.26.4)
Downloading FLAML-2.3.4-py3-none-any.whl (314 kB)
Installing collected packages: flaml
Successfully installed flaml-2.3.4
Note: you may need to restart the kernel to use updated packages.
# setup flaml
from flaml import AutoML

settings = {
    "time_budget": 240,            # total search time budget, in seconds
    "metric": 'rmse',              # metric to minimize
    "task": 'regression',          # task type
    "log_file_name": 'flaml.log',  # where to record the search history
}

automl = AutoML()
automl.fit(X_train, y_train, **settings)
# make predictions
y_pred_flaml = automl.predict(X_test)
# calculate the RMSE
rmse_flaml = root_mean_squared_error(y_test, y_pred_flaml)
print('Test RMSE: ', rmse_flaml)
# calculate the R2 score
r2_flaml = r2_score(y_test, y_pred_flaml)
print('Test R2: ', r2_flaml)
# get the best model
best_model_flaml = automl.model.estimator
# get the best parameters
best_params_flaml = automl.best_config
print('Best Parameters: ')
print(best_params_flaml)
# get the best score
best_score_flaml = automl.best_loss
print('Best CV Score: ', best_score_flaml)
[flaml.automl.logger: 05-18 15:00:23] {1728} INFO - task = regression
[flaml.automl.logger: 05-18 15:00:23] {1739} INFO - Evaluation method: cv
[flaml.automl.logger: 05-18 15:00:23] {1838} INFO - Minimizing error metric: rmse
[flaml.automl.logger: 05-18 15:00:23] {1955} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'sgd', 'catboost']
[flaml.automl.logger: 05-18 15:00:23] {2258} INFO - iteration 0, current learner lgbm
[flaml.automl.logger: 05-18 15:00:23] {2393} INFO - Estimated sufficient time budget=1953s. Estimated necessary time budget=17s.
[flaml.automl.logger: 05-18 15:00:23] {2442} INFO - at 0.2s, estimator lgbm's best error=12968.4586, best estimator lgbm's best error=12968.4586
[flaml.automl.logger: 05-18 15:00:23] {2258} INFO - iteration 1, current learner lgbm
[flaml.automl.logger: 05-18 15:00:23] {2442} INFO - at 0.4s, estimator lgbm's best error=12968.4586, best estimator lgbm's best error=12968.4586
[flaml.automl.logger: 05-18 15:00:23] {2258} INFO - iteration 2, current learner lgbm
[flaml.automl.logger: 05-18 15:00:24] {2442} INFO - at 0.6s, estimator lgbm's best error=9488.0927, best estimator lgbm's best error=9488.0927
[flaml.automl.logger: 05-18 15:00:24] {2258} INFO - iteration 3, current learner sgd
[flaml.automl.logger: 05-18 15:00:25] {2442} INFO - at 2.0s, estimator sgd's best error=26664.1568, best estimator lgbm's best error=9488.0927
[flaml.automl.logger: 05-18 15:00:25] {2258} INFO - iteration 4, current learner xgboost
[flaml.automl.logger: 05-18 15:00:25] {2442} INFO - at 2.2s, estimator xgboost's best error=13194.6180, best estimator lgbm's best error=9488.0927
[flaml.automl.logger: 05-18 15:00:25] {2258} INFO - iteration 5, current learner lgbm
[flaml.automl.logger: 05-18 15:00:26] {2442} INFO - at 2.6s, estimator lgbm's best error=5322.7602, best estimator lgbm's best error=5322.7602
[flaml.automl.logger: 05-18 15:00:26] {2258} INFO - iteration 6, current learner lgbm
[flaml.automl.logger: 05-18 15:00:26] {2442} INFO - at 2.8s, estimator lgbm's best error=5322.7602, best estimator lgbm's best error=5322.7602
[flaml.automl.logger: 05-18 15:00:26] {2258} INFO - iteration 7, current learner lgbm
[flaml.automl.logger: 05-18 15:00:26] {2442} INFO - at 3.2s, estimator lgbm's best error=4969.1631, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:26] {2258} INFO - iteration 8, current learner lgbm
[flaml.automl.logger: 05-18 15:00:27] {2442} INFO - at 3.6s, estimator lgbm's best error=4969.1631, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:27] {2258} INFO - iteration 9, current learner lgbm
[flaml.automl.logger: 05-18 15:00:27] {2442} INFO - at 3.8s, estimator lgbm's best error=4969.1631, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:27] {2258} INFO - iteration 10, current learner xgboost
[flaml.automl.logger: 05-18 15:00:27] {2442} INFO - at 4.0s, estimator xgboost's best error=13194.6180, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:27] {2258} INFO - iteration 11, current learner extra_tree
[flaml.automl.logger: 05-18 15:00:27] {2442} INFO - at 4.4s, estimator extra_tree's best error=11698.9242, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:27] {2258} INFO - iteration 12, current learner rf
[flaml.automl.logger: 05-18 15:00:28] {2442} INFO - at 4.8s, estimator rf's best error=10181.3899, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:28] {2258} INFO - iteration 13, current learner rf
[flaml.automl.logger: 05-18 15:00:28] {2442} INFO - at 5.1s, estimator rf's best error=7248.2128, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:28] {2258} INFO - iteration 14, current learner xgboost
[flaml.automl.logger: 05-18 15:00:28] {2442} INFO - at 5.3s, estimator xgboost's best error=10198.0304, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:28] {2258} INFO - iteration 15, current learner rf
[flaml.automl.logger: 05-18 15:00:29] {2442} INFO - at 5.6s, estimator rf's best error=7248.2128, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:29] {2258} INFO - iteration 16, current learner extra_tree
[flaml.automl.logger: 05-18 15:00:29] {2442} INFO - at 6.0s, estimator extra_tree's best error=8654.0612, best estimator lgbm's best error=4969.1631
[flaml.automl.logger: 05-18 15:00:29] {2258} INFO - iteration 17, current learner lgbm
[flaml.automl.logger: 05-18 15:00:30] {2442} INFO - at 7.0s, estimator lgbm's best error=4307.6538, best estimator lgbm's best error=4307.6538
[flaml.automl.logger: 05-18 15:00:30] {2258} INFO - iteration 18, current learner sgd
[flaml.automl.logger: 05-18 15:00:31] {2442} INFO - at 8.3s, estimator sgd's best error=23768.6052, best estimator lgbm's best error=4307.6538
[flaml.automl.logger: 05-18 15:00:31] {2258} INFO - iteration 19, current learner lgbm
[flaml.automl.logger: 05-18 15:00:32] {2442} INFO - at 8.7s, estimator lgbm's best error=4307.6538, best estimator lgbm's best error=4307.6538
[flaml.automl.logger: 05-18 15:00:32] {2258} INFO - iteration 20, current learner xgboost
[flaml.automl.logger: 05-18 15:00:32] {2442} INFO - at 8.9s, estimator xgboost's best error=7947.7516, best estimator lgbm's best error=4307.6538
[flaml.automl.logger: 05-18 15:00:32] {2258} INFO - iteration 21, current learner xgboost
[flaml.automl.logger: 05-18 15:00:32] {2442} INFO - at 9.2s, estimator xgboost's best error=7947.7516, best estimator lgbm's best error=4307.6538
[flaml.automl.logger: 05-18 15:00:32] {2258} INFO - iteration 22, current learner xgboost
[flaml.automl.logger: 05-18 15:00:33] {2442} INFO - at 9.6s, estimator xgboost's best error=7947.7516, best estimator lgbm's best error=4307.6538
[flaml.automl.logger: 05-18 15:00:33] {2258} INFO - iteration 23, current learner lgbm
[flaml.automl.logger: 05-18 15:00:37] {2442} INFO - at 14.0s, estimator lgbm's best error=3327.8433, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:37] {2258} INFO - iteration 24, current learner rf
[flaml.automl.logger: 05-18 15:00:38] {2442} INFO - at 14.7s, estimator rf's best error=5667.3382, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:38] {2258} INFO - iteration 25, current learner extra_tree
[flaml.automl.logger: 05-18 15:00:38] {2442} INFO - at 15.2s, estimator extra_tree's best error=8654.0612, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:38] {2258} INFO - iteration 26, current learner sgd
[flaml.automl.logger: 05-18 15:00:40] {2442} INFO - at 17.2s, estimator sgd's best error=19930.3228, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:40] {2258} INFO - iteration 27, current learner rf
[flaml.automl.logger: 05-18 15:00:41] {2442} INFO - at 17.7s, estimator rf's best error=4587.0558, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:41] {2258} INFO - iteration 28, current learner xgboost
[flaml.automl.logger: 05-18 15:00:41] {2442} INFO - at 18.1s, estimator xgboost's best error=6696.1709, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:41] {2258} INFO - iteration 29, current learner extra_tree
[flaml.automl.logger: 05-18 15:00:42] {2442} INFO - at 18.5s, estimator extra_tree's best error=6623.2201, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:42] {2258} INFO - iteration 30, current learner extra_tree
[flaml.automl.logger: 05-18 15:00:42] {2442} INFO - at 19.0s, estimator extra_tree's best error=5157.7198, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:42] {2258} INFO - iteration 31, current learner lgbm
[flaml.automl.logger: 05-18 15:00:47] {2442} INFO - at 24.1s, estimator lgbm's best error=3327.8433, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:47] {2258} INFO - iteration 32, current learner lgbm
[flaml.automl.logger: 05-18 15:00:50] {2442} INFO - at 27.2s, estimator lgbm's best error=3327.8433, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:50] {2258} INFO - iteration 33, current learner rf
[flaml.automl.logger: 05-18 15:00:51] {2442} INFO - at 28.0s, estimator rf's best error=4587.0558, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:51] {2258} INFO - iteration 34, current learner extra_tree
[flaml.automl.logger: 05-18 15:00:52] {2442} INFO - at 28.5s, estimator extra_tree's best error=5157.7198, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:52] {2258} INFO - iteration 35, current learner lgbm
[flaml.automl.logger: 05-18 15:00:58] {2442} INFO - at 35.1s, estimator lgbm's best error=3327.8433, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:58] {2258} INFO - iteration 36, current learner xgboost
[flaml.automl.logger: 05-18 15:00:59] {2442} INFO - at 35.9s, estimator xgboost's best error=5984.3581, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:00:59] {2258} INFO - iteration 37, current learner sgd
[flaml.automl.logger: 05-18 15:01:00] {2442} INFO - at 37.4s, estimator sgd's best error=19930.3228, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:01:00] {2258} INFO - iteration 38, current learner xgboost
[flaml.automl.logger: 05-18 15:01:01] {2442} INFO - at 37.8s, estimator xgboost's best error=5984.3581, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:01:01] {2258} INFO - iteration 39, current learner extra_tree
[flaml.automl.logger: 05-18 15:01:01] {2442} INFO - at 38.4s, estimator extra_tree's best error=4253.3061, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:01:01] {2258} INFO - iteration 40, current learner rf
[flaml.automl.logger: 05-18 15:01:02] {2442} INFO - at 39.1s, estimator rf's best error=4208.3611, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:01:02] {2258} INFO - iteration 41, current learner extra_tree
[flaml.automl.logger: 05-18 15:01:03] {2442} INFO - at 39.7s, estimator extra_tree's best error=4253.3061, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:01:03] {2258} INFO - iteration 42, current learner catboost
[flaml.automl.logger: 05-18 15:02:38] {2442} INFO - at 135.1s, estimator catboost's best error=3340.2819, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:02:38] {2258} INFO - iteration 43, current learner extra_tree
[flaml.automl.logger: 05-18 15:02:39] {2442} INFO - at 135.6s, estimator extra_tree's best error=3851.9675, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:02:39] {2258} INFO - iteration 44, current learner lgbm
[flaml.automl.logger: 05-18 15:02:43] {2442} INFO - at 140.3s, estimator lgbm's best error=3327.8433, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:02:43] {2258} INFO - iteration 45, current learner extra_tree
[flaml.automl.logger: 05-18 15:02:44] {2442} INFO - at 141.0s, estimator extra_tree's best error=3851.9675, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:02:44] {2258} INFO - iteration 46, current learner catboost
[flaml.automl.logger: 05-18 15:04:32] {2442} INFO - at 248.8s, estimator catboost's best error=3340.2819, best estimator lgbm's best error=3327.8433
[flaml.automl.logger: 05-18 15:04:33] {2685} INFO - retrain lgbm for 0.9s
[flaml.automl.logger: 05-18 15:04:33] {2688} INFO - retrained model: LGBMRegressor(colsample_bytree=0.6649148062238498,
learning_rate=0.17402065726724145, max_bin=255,
min_child_samples=3, n_estimators=93, n_jobs=-1, num_leaves=15,
reg_alpha=0.0009765625, reg_lambda=0.006761362450996489,
verbose=-1)
[flaml.automl.logger: 05-18 15:04:33] {1985} INFO - fit succeeded
[flaml.automl.logger: 05-18 15:04:33] {1986} INFO - Time taken to find the best model: 14.040556192398071
Test RMSE: 3459.7530968991273
Test R2: 0.9591996557531124
Best Parameters:
{'n_estimators': 93, 'num_leaves': 15, 'min_child_samples': 3, 'learning_rate': 0.17402065726724145, 'log_max_bin': 8, 'colsample_bytree': 0.6649148062238498, 'reg_alpha': 0.0009765625, 'reg_lambda': 0.006761362450996489}
Best CV Score: 3327.8432720196106
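Beyond the best overall configuration, FLAML also records what it found for each learner it tried; a minimal sketch using the fitted automl object above (best_estimator and best_config_per_estimator are attributes of FLAML's AutoML class):

```python
# name of the winning learner (here: 'lgbm')
print('Best estimator: ', automl.best_estimator)

# best configuration FLAML found for each learner it tried
# (learners it never improved on may map to None)
for learner, config in automl.best_config_per_estimator.items():
    print(learner, '->', config)
```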
FLAML is well-suited for quick AutoML tasks on small- to medium-sized datasets. Compared to tools like Optuna (which offers flexible search and pruning) and BayesSearchCV (which integrates tightly with scikit-learn), FLAML prioritizes speed, efficiency, and minimal configuration.
In contrast, there are also fully managed cloud AutoML services that handle the entire pipeline, from data preprocessing to deployment. They are convenient but incur usage costs.
Fully Managed Cloud AutoML Services
| Platform | Product Name |
|---|---|
| GCP | Vertex AI AutoML |
| AWS | SageMaker Autopilot |
⚠️ Note: These cloud-based AutoML platforms are not free — you pay for compute, storage, and usage time.
CV-based hyperparameter tuning often leads to better model performance than generic cloud AutoML, especially when you understand your data and models.
12.7 Resources
Optuna paper
Optuna GitHub repo
scikit-optimize official website
scikit-optimize GitHub repo
FLAML GitHub repo
FLAML official documentation
AutoML Benchmark paper